Journal of Clinical Epidemiology
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Background: The number of problematic randomized clinical trials (RCTs) has risen sharply in recent decades, posing serious challenges to the integrity of the healthcare evidence ecosystem. Objective: To investigate whether retraction of problematic RCTs could reduce evidence contamination. Design: Retrospective cohort study. Setting: A secondary analysis of the VITALITY Study database. Participants: 1,330 retracted RCTs with 847 systematic reviews. Measurements: The difference in the median number (and...
Background: Journals may respond to integrity concerns by publishing an editorial response (editorial notice, expression of concern (EoC) or retraction). We investigated whether the type of editorial response affected citation rates. Methods: We obtained citations for 172 randomised controlled trials (RCTs) with integrity concerns (41 had editorial notices, 38 EoCs and 23 retractions) and control RCTs from the same journal and year. Monthly citation rates up to 60 months before and after editorial ...
Background: The ability of large language models (LLMs) to work collaboratively and screen studies in a systematic review (SR) is under-explored. Hence, we aimed to evaluate the effectiveness of LLMs in automating the screening process in systematic reviews. Methods: This is an observational study that included labeled data (titles and abstracts) for five SRs. Originally, two reviewers screened the citations independently for eligibility. A third reviewer cross-checked each citation for quality ...
The terms personalized, individualized and precision medicine are increasingly used to describe health interventions, yet their operational meaning in clinical research remains unclear. Despite extensive conceptual discussion, there is limited empirical evidence on how these labels are applied in randomized controlled trials (RCTs) and whether such trials meet standards of transparency and methodological rigor. We systematically examined 262 RCTs published between 2020 and 2022 that used the ter...
Background: In pharmacoepidemiological studies, the treatment duration, in days of treatment (DoT), associated with individual electronic drug utilization records (DUR) is usually missing. Researcher-defined duration (RDD) calculation approaches, as opposed to data-driven approaches, can be used to estimate DoT based on the specific choices and assumptions made by investigators. These choices and assumptions are usually underreported or even undocumented. We aimed to develop a framework for the standardization of terminology, formulas, im...
Systematic reviews are used in academia, biotechnology, pharmaceutical companies and government to synthesise and appraise large numbers of publications. The current (largely manual) workflow takes an average of 9-18 months [1], at a cost of more than $100,000 per review [2]. We built a platform, ScholaraAI, that leverages artificial intelligence to cut this to less than 0.1% of the time without compromising quality. ScholaraAI facilitates end-to-end systematic reviews: search, screening, data extraction, and analysi...
Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a sta...
Background: Systematic reviews (SRs) are essential for evidence-based medicine but require extensive time and resources for abstract screening. Large language models (LLMs) offer potential for automating this process, yet concerns about data privacy, intellectual property protection, and reproducibility limit the use of cloud-based solutions in research settings. Objective: To evaluate the performance of a locally deployed 20-billion parameter LLM for automated abstract screening in systematic revi...
Importance: Large language models (LLMs) offer potential decision support, but their accuracy varies. Prompt engineering can generally enhance LLM behavior in a clinical context, yet best practices have yet to be formally explored in realistic neurology settings. Objective: To evaluate the impact of structured prompting versus simple prompting on the performance of six LLMs (three closed-source: OpenAI GPT-4o, OpenAI o3, OpenAI GPT-5.2 Thinking; three open-source: Meta Llama-4-Scout-17B-16E-Instruc...
Large language models (LLMs) are increasingly transforming scientific workflows, yet their application to rigorous evidence synthesis remains underexplored. We present a fully automated pipeline, executed as a single Python script, that leverages the Claude API to generate systematic reviews from literature search through manuscript completion without human intervention. Our pipeline processes hundreds of papers through iterative API calls for inclusion evaluation, information extraction...
Background: In immune checkpoint inhibitor (ICI) trials, overall survival (OS) benefits are well established, yet improvements in quality of life (QoL) are often inconsistent or absent in conventional analyses. This apparent discordance raises important questions: are QoL outcomes truly unrelated to survival, and how can QoL results be better utilized and interpreted? Methods: A model-based meta-analysis (MBMA) of longitudinal EORTC QLQ-C30 global health status/quality of life data from randomized ...
Introduction: Several filters are routinely used to remove animal or nonhuman records in Ovid Embase, despite there being no performance data for them. The filters differ in their design approach. Objective: To understand and compare the impact of 11 filters to remove animal or nonhuman records in Ovid Embase, and to understand the indexing of relevant subject headings in Embase. Methods: To assess filter performance, we screened and categorised 3,000 records as should be removed or should be reta...
Background and Aims: The glucagon-like peptide-1 receptor agonist (GLP-1 RA) semaglutide has demonstrated efficacy for the secondary prevention of cardiovascular disease among patients with overweight/obesity without diabetes mellitus. However, the comparative effectiveness of GLP-1 RA versus other antiobesity medications (e.g., phentermine-topiramate) has not been evaluated. Methods: This was a retrospective observational cohort study applying target trial emulation methodology to the Truveta electro...
Background: Systematic reviews are important for informing public health policies and program selection; however, they are time- and resource-intensive. Artificial intelligence (AI) offers a way to reduce these labour-intensive requirements for various aspects of systematic review production, including data extraction. To date, there is limited robust evidence evaluating the accuracy and efficiency of AI for data extraction. This study within a review (SWAR) aimed to determine whether human d...
Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ing...
Background: No randomized clinical trial comparing the most established new modalities of treatment for patients with localized prostate cancer has been published, and there is scarce comparative effectiveness research assessing Patient-Reported Outcome Measures (PROMs). Objective: To compare the impact of active surveillance, robot-assisted radical prostatectomy (RARP), intensity-modulated radiotherapy (IMRT), and real-time brachytherapy on patients through PROMs, from pre-treatment to five years...
Objectives: Growth Mindset and Grit have been proposed as key psychological resources for resilience and adaptation, yet their manifestation and social distribution in later life remain underexplored. This study examines the structure, distribution, and correlates of Growth Mindset and Grit in older adulthood using proxy indicators in the English Longitudinal Study of Ageing (ELSA). Methods: Proxy indicators reflecting learning behaviour, personality traits, affect, and beliefs were used to derive ...
Background: The NHS-Galleri trial reported a substantial reduction in Stage IV cancer diagnoses and a four-fold increase in cancer detection rates, but did not meet its primary endpoint of reducing combined Stage III+IV diagnoses in a prespecified group of 12 cancers. We hypothesize that stage slip (progression of cancers from Stage I/II to Stage III during diagnostic workup) is the primary mechanism behind this statistical masking. Methods: We developed a Monte Carlo simulation of 142,000 partici...
Abstract: Accurate health information is ineffective if patients cannot understand it. Large Language Model (LLM) health research values veridical precision; however, linguistic accessibility remains an under-examined component of output quality and usability. This study investigated two sources of variability in readability classification: differences across LLM systems and across readability metrics. The analysis tested 1,120 data points from seven systems in English and Portuguese, comparing ba...
Background: Retrieval-augmented generation (RAG) frameworks such as RAPID [1] have demonstrated that staged planning and retrieval grounding improve long-form text generation. However, most implementations remain similarity-driven and open-domain, lacking the epistemic safeguards required for biomedical synthesis, where mechanistic completeness, temporal governance, traceability, and explicit gap classification are essential. Objective: To develop and evaluate a topology-aware, graph-augmented retr...